NSF PAR Search | NSF Public Access Repository

Checking δ-Satisfiability of Reals with Integrals

https://doi.org/10.1145/3720446

Rivera, Cody; Bhusal, Bishnu; Chadha, Rohit; Sistla, A. Prasad; Viswanathan, Mahesh (April 2025, Proceedings of the ACM on Programming Languages)

Many synthesis and verification problems can be reduced to determining the truth of formulas over the real numbers. These formulas often involve constraints that contain integrals. To this end, we extend the framework of δ-decision procedures with techniques for handling integrals of user-specified real functions. We implement this decision procedure in the tool ∫dReal, which is built on top of dReal. We evaluate ∫dReal on a suite of problems that includes formulas verifying the fairness of algorithms, formulas verifying the privacy and utility of privacy mechanisms, and formulas that synthesize parameters for the desired utility of privacy mechanisms. The performance of the tool in these experiments demonstrates the effectiveness of ∫dReal.
Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs

https://doi.org/10.1109/IPDPS53621.2022.00075

Rivera, Cody; Di, Sheng; Tian, Jiannan; Yu, Xiaodong; Tao, Dingwen; Cappello, Franck (May 2022, The 36th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2022))

More and more HPC applications require fast and effective compression techniques to handle large volumes of data in storage and transmission. Not only do these applications need to compress the data effectively during simulation, but they also need to perform decompression efficiently for post hoc analysis. SZ is an error-bounded lossy compressor for scientific data, and cuSZ is a version of SZ designed to take advantage of the GPU's power. At present, cuSZ's compression performance has been optimized significantly, while its decompression still suffers considerably lower performance because of its sophisticated lossless compression step: a customized Huffman decoding. In this work, we aim to significantly improve the Huffman decoding performance for cuSZ, thus improving the overall decompression performance in turn. To this end, we first investigate two state-of-the-art GPU Huffman decoders in depth. Then, we propose a deep architectural optimization for both algorithms. Specifically, we take full advantage of CUDA GPU architectures by using shared memory in the decoding/writing phases, tuning the amount of shared memory online, improving memory access patterns, and reducing warp divergence. Finally, we evaluate our optimized decoders on an Nvidia V100 GPU using eight representative scientific datasets. Our new decoding solution obtains an average speedup of 3.64X over cuSZ's Huffman decoder and improves its overall decompression performance by 2.43X on average.
Full Text Available
Revisiting Huffman Coding: Toward Extreme Performance on Modern GPU Architectures

https://doi.org/10.1109/IPDPS49936.2021.00097

Tian, Jiannan; Rivera, Cody; Di, Sheng; Chen, Jieyang; Liang, Xin; Tao, Dingwen; Cappello, Franck (May 2021, The 35th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2021))

Today's high-performance computing (HPC) applications produce vast volumes of data that are challenging to store and transfer efficiently during execution, so data compression is becoming a critical technique for mitigating the storage burden and data movement cost. Huffman coding is arguably the most efficient entropy coding algorithm in information theory, and it is a fundamental step in many modern compression algorithms such as DEFLATE. On the other hand, today's HPC applications rely more and more on accelerators such as GPUs on supercomputers, while Huffman encoding suffers from low throughput on GPUs, resulting in a significant bottleneck in the entire data processing. In this paper, we propose and implement an efficient Huffman encoding approach based on modern GPU architectures, which addresses two key challenges: (1) how to parallelize the entire Huffman encoding algorithm, including codebook construction, and (2) how to fully utilize the high memory-bandwidth feature of modern GPU architectures. The detailed contribution is four-fold. (1) We develop an efficient parallel codebook construction on GPUs that scales effectively with the number of input symbols. (2) We propose a novel reduction-based encoding scheme that can efficiently merge the codewords on GPUs. (3) We optimize the overall GPU performance by leveraging state-of-the-art CUDA APIs such as Cooperative Groups. (4) We evaluate our Huffman encoder thoroughly using six real-world application datasets on two advanced GPUs and compare it with our implemented multi-threaded Huffman encoder. Experiments show that our solution can improve the encoding throughput by up to 5.0x and 6.8x on NVIDIA RTX 5000 and V100, respectively, over the state-of-the-art GPU Huffman encoder, and by up to 3.3x over the multi-threaded encoder on two 28-core Xeon Platinum 8280 CPUs.
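For orientation, the two stages named in this abstract (codebook construction and encoding) can be illustrated with a minimal sequential sketch in Python. This is the classic textbook Huffman procedure, not the parallel GPU algorithm the paper proposes; the function names `build_codebook`, `encode`, and `decode` are hypothetical and chosen only for this illustration.

```python
import heapq
from collections import Counter


def build_codebook(data):
    """Classic sequential Huffman codebook: returns {symbol: bitstring}.

    Illustrative baseline only; the paper's parallel construction is not shown.
    """
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: partial code}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code beneath them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(data, codebook):
    """Concatenate the codeword of every input symbol."""
    return "".join(codebook[s] for s in data)


def decode(bits, codebook):
    """Prefix-free decoding: emit a symbol as soon as a codeword matches."""
    inverse = {code: sym for sym, code in codebook.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

The sequential merge loop and the per-symbol concatenation in `encode` are the parts the abstract says are parallelized on the GPU; the sketch only fixes what output those stages must produce.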
Full Text Available
Optimizing Error-Bounded Lossy Compression for Scientific Data on GPUs

https://doi.org/10.1109/Cluster48925.2021.00047

Tian, Jiannan; Di, Sheng; Yu, Xiaodong; Rivera, Cody; Zhao, Kai; Jin, Sian; Feng, Yunhe; Liang, Xin; Tao, Dingwen; Cappello, Franck (September 2021, 2021 IEEE International Conference on Cluster Computing (CLUSTER 2021))

Full Text Available
TSM2X: High-performance tall-and-skinny matrix-matrix multiplication on GPUs

https://doi.org/10.1016/j.jpdc.2021.02.013

Rivera, Cody; Chen, Jieyang; Xiong, Nan; Zhang, Jing; Song, Shuaiwen Leon; Tao, Dingwen (February 2021, Journal of Parallel and Distributed Computing)
Full Text Available
cuSZ: An Efficient GPU-Based Error-Bounded Lossy Compression Framework for Scientific Data

https://doi.org/10.1145/3410463.3414624

Tian, Jiannan; Di, Sheng; Zhao, Kai; Rivera, Cody; Hickman Fulp, Megan; Underwood, Robert; Jin, Sian; Liang, Xin; Calhoun, Jon; Tao, Dingwen; et al. (October 2020, The 29th International Conference on Parallel Architectures and Compilation Techniques (PACT 2020))

Error-bounded lossy compression is a state-of-the-art data reduction technique for HPC applications because it not only significantly reduces storage overhead but also retains high fidelity for post-analysis. Because supercomputers and HPC applications are becoming heterogeneous through accelerator-based architectures, in particular GPUs, several development teams have recently released GPU versions of their lossy compressors. However, existing state-of-the-art GPU-based lossy compressors suffer from either low compression and decompression throughput or low compression quality. In this paper, we present cuSZ, an optimized GPU version of SZ, one of the best error-bounded lossy compressors. To the best of our knowledge, cuSZ is the first error-bounded lossy compressor on GPUs for scientific data. Our contributions are fourfold. (1) We propose a dual-quantization scheme to entirely remove the data dependency in the prediction step of SZ such that this step can be performed very efficiently on GPUs. (2) We develop an efficient customized Huffman coding for the SZ compressor on GPUs. (3) We implement cuSZ using CUDA and optimize its performance by improving the utilization of GPU memory bandwidth. (4) We evaluate cuSZ on five real-world HPC application datasets from the Scientific Data Reduction Benchmarks and compare it with other state-of-the-art methods on both CPUs and GPUs. Experiments show that cuSZ improves SZ's compression throughput by up to 370.1x and 13.1x over the production version running on single and multiple CPU cores, respectively, while achieving the same quality of reconstructed data. It also improves the compression ratio by up to 3.48x on the tested data compared with another state-of-the-art GPU-supported lossy compressor.
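The idea behind contribution (1) can be sketched in one dimension. The abstract does not spell out the scheme, so the details below are assumptions made for illustration: values are first quantized independently to integer multiples of twice the error bound, and the prediction step then works on those integers with a simple previous-value predictor. The names `dual_quantize` and `reconstruct` are hypothetical and are not cuSZ's API.

```python
def dual_quantize(data, eb):
    """1-D illustration of a two-stage quantization with error bound eb.

    Assumed for this sketch (not stated in the abstract): pre-quantization
    to multiples of 2*eb, and a previous-value predictor.
    """
    # Stage 1: each element is quantized on its own, with no dependency
    # on any other element, so this loop is trivially parallel.
    pre = [round(x / (2 * eb)) for x in data]
    # Stage 2: prediction error against the previous pre-quantized value.
    # It reads only `pre`, never reconstructed output, so it is also
    # independent per element.
    return [pre[i] - (pre[i - 1] if i else 0) for i in range(len(pre))]


def reconstruct(deltas, eb):
    """Invert the sketch: a running sum restores the pre-quantized integers."""
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc * 2 * eb)
    return out
```

Because rounding to the nearest multiple of `2 * eb` moves a value by at most `eb`, and the integer deltas are exact, the reconstruction stays within the error bound in this sketch. The small integer deltas are the kind of symbols a Huffman coder, as in contribution (2), would then compress.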
Full Text Available
